Although biological disease-modifying antirheumatic drugs (bDMARDs) have become considerably cheaper since biosimilar agents became available, they remain a significant cost to health services worldwide, particularly because prescribing has increased as the thresholds for commencing this treatment have been lowered. For example, in 2017/18, the total cost of adalimumab to the NHS in England was £494.5m and the total cost of etanercept was £219.8m, making them the first and third most costly single agents, respectively.1 However, up to 40% of patients receiving bDMARDs for rheumatoid arthritis (RA) still have uncontrolled disease activity due to either primary inefficacy (the drug never achieves efficacy) or secondary inefficacy (the drug demonstrates efficacy initially, then loses it over time).2,3
Furthermore, it remains unclear how bDMARD efficacy is influenced by seropositivity to anticitrullinated peptide antibodies (ACPA) because the understanding of RA pathogenesis at the molecular level remains incomplete. Epidemiological data have shown that seropositivity to ACPA influences treatment response to bDMARDs.4 However, the underlying biological basis of how patients with different RA endotypes respond differently to therapeutic agents with different modes of action has not been elucidated.4
Despite the explosion of molecular and computational algorithmic methods for enhancing and widening our understanding of RA pathogenesis and pathophysiology, it is still not possible to predict which patients will respond to which drugs at the first attempt. Before this important clinical step can be achieved, study design and analytical approaches need to be developed and standardised to improve the generalisability of study findings.
It is hoped that using a patient’s biology in the form of molecular biomarkers to treat RA will enable more informed prescribing. This molecular approach to defining signatures of treatment response in patients with RA has led to the establishment of large biobanks of patient blood and synovial biopsy samples and clinical information, such as the Pathobiology of Early Arthritis Cohort biobank in the UK.5 This study identified synovial endotypes corresponding to treatment response to conventional synthetic DMARDs. Nonetheless, future work may need to relate synovial signatures to systemic blood biomarkers to translate findings to the clinic room. Such studies have been supplemented by the formation of large collaborative networks such as the Maximizing Therapeutic Utility in RA (MATURA) programme,6 the RA-MAP Consortium7 and the Innovative Medicines Initiative-funded Taxonomy, Treatments, Targets and Remission consortium.8
However, challenges remain in precision medicine research in RA. What are the optimal analytical methods for using data to their maximum potential while ensuring external validity? Which clinical outcome variable should be used for drug response prediction? Which biomarkers should be measured, and of what provenance?
Some progress has been made, with a number of recent studies in RA using integrative multiomics techniques. For example, Tasaki et al. used this approach to define molecular signatures of treatment response in patients with RA compared with healthy controls.9 They used a combination of proteomics, transcriptomics and immunophenotyping to show that biomarker profiles analogous to molecular health were achieved following successful treatment in a subset of patients. However, findings for specific drugs are difficult to interpret from this study because the analysis of the study subjects was pooled, despite heterogeneous treatment across the study cohort: subjects received methotrexate, infliximab or tocilizumab (i.e. drugs with different modes of action).
Studies in RA have largely employed modest sample numbers and have pooled patients on different medications with different modes of action into the same analyses.9–14 Consequently, precision prescribing recommendations cannot be made. Furthermore, the majority of multiomics studies that have been published thus far were cross-sectional and often did not use healthy controls. Future inception cohort studies with improved study design may yield more biologically relevant and clinically applicable findings.
One study that has successfully advanced stratified medicine approaches is the Accelerate Information of Molecular Signatures study.15 This study sought to identify and test the clinical utility of a blood-based molecular signature as a classifier of treatment response to tumour necrosis factor inhibitors (TNFi), a class of bDMARDs. The molecular signature response classifier developed during the study had a high negative predictive value of treatment response to TNFi. Therefore, this approach could be a future direction for predictive, precision medicine in the treatment of patients with RA.
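How useful a high negative predictive value is in practice depends on how common the flagged outcome is among the patients tested. The following minimal Python sketch computes negative predictive value from sensitivity, specificity and prevalence using Bayes' rule. The function name and all input values are illustrative assumptions; none of the numbers are taken from the study described above.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """Return P(condition absent | test negative) via Bayes' rule.

    All inputs are proportions between 0 and 1. "Prevalence" is the
    proportion of tested patients who truly have the condition that
    the classifier is designed to flag.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    # Proportion of all tested patients who are true or false negatives.
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    if true_negatives + false_negatives == 0.0:
        raise ValueError("no negative test results are possible with these inputs")
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical values, chosen only to show the effect of prevalence.
npv_low_prevalence = negative_predictive_value(0.5, 0.9, 0.3)
npv_high_prevalence = negative_predictive_value(0.5, 0.9, 0.5)
```

With sensitivity and specificity held fixed, the negative predictive value falls as prevalence rises, which is one reason why a classifier validated in one cohort needs to be re-examined in populations with a different case mix.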
Another challenge related to data analysis is the resampling and reuse of data. The reuse of sample data could lead to reporting bias due to data leakage. A standard machine learning technique is to divide data randomly into multiple groups and carry out analysis on each group, then pool results from each group, in order to avoid bias. However, serial resampling without validation in an independent cohort and/or including patients from training cohorts in subsequent validation cohorts (as in a study by Tao et al.12) can lead to the overoptimistic reporting of performance metrics such as receiver operating characteristic curves and accuracy. Strong efforts must be made to avoid such reuse of data to prevent inaccurate overreporting of model performance. As multiomics studies increase in number, so too does the number of features being tested. Using robust and reproducible strategies to reduce dimensionality and identify important features, whilst removing less relevant predictors, is essential for optimal model performance and interpretability.
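One simple safeguard against this kind of leakage is to split data at the level of the patient rather than the sample, so that no patient contributes to both the training and the validation portion of any fold. The sketch below illustrates the idea in plain Python. The function name, the patient identifiers and the number of folds are hypothetical, and it is not the procedure used in any of the studies cited here.

```python
import random


def patient_level_folds(sample_patient_ids, n_folds=5, seed=0):
    """Build cross-validation folds in which each patient appears on one side only.

    sample_patient_ids[i] is the patient who provided sample i.
    Returns a list of (train_indices, validation_indices) pairs.
    """
    patients = sorted(set(sample_patient_ids))
    if n_folds < 2 or n_folds > len(patients):
        raise ValueError("n_folds must be between 2 and the number of patients")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(patients)
    # Assign whole patients (not individual samples) to folds in rotation.
    fold_of_patient = {p: i % n_folds for i, p in enumerate(patients)}
    folds = []
    for k in range(n_folds):
        validation = [i for i, p in enumerate(sample_patient_ids)
                      if fold_of_patient[p] == k]
        train = [i for i, p in enumerate(sample_patient_ids)
                 if fold_of_patient[p] != k]
        folds.append((train, validation))
    return folds


# Hypothetical example: nine samples from six patients, some sampled repeatedly.
ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6"]
folds = patient_level_folds(ids, n_folds=3)
```

Any feature selection or dimensionality reduction step would also need to be fitted on the training indices of each fold only. Internal splitting of this kind does not replace validation in an independent cohort.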
Discovery biomarker studies should pay careful attention to the chosen measure of treatment response. In cancer, a simple binary outcome measure is most commonly used (i.e. the presence or absence of malignancy). In RA, however, outcome measures are tied to composite disease activity scores such as the Disease Activity Score in 28 Joints and the Clinical Disease Activity Index.16,17 These two activity measures are similar in that both incorporate patient-reported outcome measures (PROMs), such as global health, which may be influenced by factors other than active RA, such as sleep, mood and environment.
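To make the contribution of patient-reported components concrete, the following Python sketch computes both composite scores. The coefficients shown are the commonly published ones for the C-reactive protein version of the Disease Activity Score in 28 Joints, stated here from general knowledge rather than from this article, so they should be checked against the cited sources before use. The function names and the example patient values are hypothetical.

```python
import math


def das28_crp(tender_joints, swollen_joints, crp_mg_per_l, patient_global_mm):
    """Disease Activity Score in 28 Joints, C-reactive protein version.

    Joint counts range from 0 to 28; patient global health is a
    0-100 mm visual analogue scale (assumed, commonly published form).
    """
    return (0.56 * math.sqrt(tender_joints)
            + 0.28 * math.sqrt(swollen_joints)
            + 0.36 * math.log(crp_mg_per_l + 1.0)
            + 0.014 * patient_global_mm   # patient-reported component
            + 0.96)


def cdai(tender_joints, swollen_joints, patient_global_cm, physician_global_cm):
    """Clinical Disease Activity Index: an unweighted sum of four components.

    Both global assessments are on a 0-10 cm scale.
    """
    return tender_joints + swollen_joints + patient_global_cm + physician_global_cm


# Hypothetical patient: 4 tender joints, 1 swollen joint, CRP of 0 mg/L.
score_das28 = das28_crp(4, 1, 0.0, 50)
score_cdai = cdai(4, 1, 5.0, 3.0)
# Same patient with a worse self-reported global health score only.
score_das28_worse_global = das28_crp(4, 1, 0.0, 90)
```

In this illustration, changing only the patient global health score from 50 to 90 raises the score by 0.56 with no change in joint counts or C-reactive protein, which shows how factors unrelated to joint inflammation can move the outcome measure.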
Hensor et al. showed that a two-component Disease Activity Score in 28 Joints, which omitted tender joint count and patient global health scores, has improved association with radiological outcomes (ultrasound and plain radiograph).18 Therefore, a move towards multiomics predictive studies using an outcome measure with less “noise” may lead to more reproducible and biologically plausible findings. However, PROMs should still be incorporated into day-to-day clinical care so that patient experience and symptom control are at the forefront of management in the clinic room. Outcome measures stratified into biological and PROM categories could be the future of RA management.19
Are there other ways to improve the outcome measure of interest? As has been shown,18 there are other possibilities for defining treatment response that deviate from the status quo. Future directions could include a more sophisticated use of artificial intelligence techniques and imaging to aid the definition of active RA at the joint. For example, Yasaka et al. successfully used deep learning with a convolutional neural network to predict vertebral bone mineral density from abdominal computed tomography scans, avoiding further imaging with dual-energy X-ray absorptiometry in patients who had already received radiation exposure.20 Similar analytical techniques could be developed to detect synovitis, small effusions or even bone lesions to optimise the time taken for image interpretation.
Furthermore, granular multiomics studies could use detailed molecular acquisition techniques coupled with interpretable machine learning methods to ascertain whether there are cell or tissue surrogates for joint synovitis. The ideal biomarker(s) would come from blood sampling, as this is quicker than the imaging of multiple joints and less invasive than sampling other tissues that might be affected, such as the synovium. However, sampling from blood alone might not be sufficient for developing robust predictions, even if the outcome measure can be optimised, thus demonstrating the need for tissue measurements or paired blood and tissue measures across several omic data modalities.
Future studies should also aim to go beyond predicting treatment response and provide an additional mechanistic understanding of the pathobiology of RA and how different RA disease endotypes are defined. A focus on physiological networks that have already been identified, such as synovial complement proteins21 and Fcγ receptors (in a number of tissues),22 may help to yield more significant results. Interpretable machine learning techniques can aid the visualisation of significant variables and the identification of plausible biological pathways and their mechanistic understanding.23 This approach has been successful when applied to a multiomics dataset of proteomics, metabolomics and lipidomics samples collected from a trial cohort of patients who had suffered from traumatic brain injury and were treated with thawed plasma.24 This study used the Essential Regression interpretable machine learning technique25 to identify multiple latent factors associated with molecular differences in components of the clotting cascade, leading to the hypothesis that patients with traumatic brain injury have an altered biological response compared to those without traumatic brain injury.24
Interpretable machine learning approaches can potentially overcome the problem of multiomics data dimensionality (i.e. a huge number of molecular features are measured in a relatively small number of patient samples), go beyond predicting treatment response by causally linking latent factors within multiomics data to clinical endotypes (e.g. treatment response) and identify a sparse set of individual biomarkers for further validation and functional studies.
The integration of multiomics techniques and the use of advanced machine learning analytical methods are in their infancy, and the publication of these studies in the field of RA and wider rheumatology research is expected to increase over the coming years, particularly as technology continues to develop and improve. In time, the hope is that findings can be validated in independent populations and tested in early-phase trials so that scientific advances can be translated to routine clinical practice for the benefit of clinicians and, most importantly, patients.